Post-Clustering Soft Vector Quantization with Inverse Power-Function Distribution, and Application on Discrete HMM-Based Machine Learning
Authors
Abstract
In this paper, we introduce a soft vector quantization scheme with an inverse power-function distribution, and analytically derive an upper bound on the resulting quantization noise energy relative to that of conventional (hard-decision) vector quantization. We also discuss the positive impact of this kind of soft vector quantization on the performance of machine-learning systems that include one or more vector quantization modules. Moreover, we provide experimental evidence that it helps avoid over-fitting and boosts the robustness of such systems in the presence of considerable parasitic variance (e.g., noise) in the runtime inputs. The experiments were conducted with two versions of one of the best reported discrete HMM-based Arabic OCR systems: one version deploying hard vector quantization and the other deploying the soft vector quantization presented herein. Test samples of real-life scanned pages are used to challenge both versions, and their recognition error margins are compared.
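The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, assuming that "inverse power-function distribution" means each codeword c_k receives a membership weight proportional to d(x, c_k)^(-p), and that a discrete HMM consumes the soft output as a weighted sum of its per-symbol emission probabilities. The codebook is assumed to come from a prior clustering step (hence "post-clustering"). The names `hard_vq`, `soft_vq`, `emission_prob` and the exponent `power` are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hard_vq(x: Vector, codebook: Sequence[Vector]) -> int:
    """Hard-decision VQ: return the index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def soft_vq(x: Vector, codebook: Sequence[Vector],
            power: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Hypothetical soft VQ: weight w_k proportional to d(x, c_k) ** (-power).

    ASSUMPTION: this inverse power-function weighting and the default
    exponent are illustrative, not the paper's exact definition.
    The codebook is assumed to have been built by an earlier clustering step.
    """
    dists = [math.sqrt(sq_dist(x, c)) for c in codebook]
    # x lies on a codeword: d ** (-power) diverges, so the limit is one-hot.
    for k, d in enumerate(dists):
        if d < eps:
            return [1.0 if j == k else 0.0 for j in range(len(codebook))]
    raw = [d ** (-power) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the weights sum to 1


def emission_prob(weights: Sequence[float], b_j: Sequence[float]) -> float:
    """Soft emission probability of HMM state j: sum_k w_k * b_j(k).

    ASSUMPTION: this weighted-sum rule is one common way to feed soft VQ
    output into a discrete HMM; with one-hot weights it reduces to the
    usual table lookup b_j(q(x)) of hard VQ.
    """
    return sum(w * b for w, b in zip(weights, b_j))


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x = (0.2, 0.1)
    print(hard_vq(x, codebook))  # → 0
    weights = soft_vq(x, codebook)
    print(weights)  # most of the mass on codeword 0, the rest spread out
    print(emission_prob(weights, [0.7, 0.2, 0.1]))
```

In this form the exponent controls how "soft" the decision is: as `power` grows, the weights concentrate on the nearest codeword and the scheme approaches hard VQ, while smaller values spread probability over nearby codewords, so an input lying near a cell boundary is not forced onto a single discrete symbol.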
Similar resources
Relevance-Vector-Machine Quantization and Density-Function Estimation: Application to HMM-Based Multi-Aspect Target Classification
The relevance vector machine (RVM) is applied for feature-vector quantization (codebook design) and for density-function estimation in high-dimensional feature space. The RVM represents a Bayesian extension of the widely applied support vector machine (SVM). The use of RVMs for quantization and density-function estimation is explored with application to discrete and continuous HMMs, respectivel...
Connectionist Probability Estimators in HMM Using Genetic Clustering Application for Speech Recognition and Medical Diagnosis
The main goal of this paper is to compare the performance which can be achieved by five different approaches analyzing their applications’ potentiality on real world paradigms. We compare the performance obtained with (1) Multi-network RBF/LVQ structure (2) Discrete Hidden Markov Models (HMM) (3) Hybrid HMM/MLP system using a Multi-Layer Perceptron (MLP) to estimate the HMM emission probabilitie...
A New Vector Quantization Front-End Process for Discrete HMM Speech Recognition System
The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when th...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...